DalTREC 2005 Spam Track: Spam Filtering Using N-gram-based Techniques
Authors
Abstract
We briefly describe the DalTREC 2005 Spam track submission. DalTREC is the TREC research project at Dalhousie University. Four packages were submitted, and they achieved median performance. The results are interesting and may be seen as positive in light of the simplicity of our approaches.
Similar resources
An Adaptive Approach to Spam Filtering on a New Corpus
Motivated by the absence of rigorous experimentation in the area of spam filtering using realistic email data, we present a newly-assembled corpus of genuine and unsolicited (spam) email, dubbed GenSpam, to be made publicly available. We also propose an adaptive model for semi-structured document classification based on smoothed n-gram language modelling and interpolation, and report promising ...
Spam Filtering Using Character-Level Markov Models: Experiments for the TREC 2005 Spam Track
This paper summarizes our participation in the TREC 2005 spam track, in which we consider the use of adaptive statistical data compression models for the spam filtering task. The nature of these models allows them to be employed as Bayesian text classifiers based on character sequences. We experimented with two different compression algorithms under varying model parameters. All four filters th...
Naive Bayes Spam Filtering Using Word-Position-Based Attributes
This paper explores the use of the naive Bayes classifier as the basis for personalised spam filters. Several machine learning algorithms, including variants of naive Bayes, have previously been used for this purpose, but the author’s implementation using word-position-based attribute vectors gave very good results when tested on several publicly available corpora. The effects of various forms o...
York University at TREC 2005: SPAM Track
We propose a variant of the k-nearest neighbor classification method, called instance-weighted k-nearest neighbor method, for adaptive spam filtering. The method assigns two weights, distance weight and correctness weight, to a training instance, and makes use of the two weights when classifying a new email. The correctness weight is also used in the maintenance of the training data to make the...
Naive Bayes spam filtering using word-position-based attributes and length-sensitive classification thresholds
This paper explores the use of the naive Bayes classifier as the basis for personalised spam filters. Several machine learning algorithms, including variants of naive Bayes, have previously been used for this purpose, but the author’s implementation using word-position-based attribute vectors gave very good results when tested on several publicly available corpora. The effects of various forms ...
Publication date: 2005